A framework for scalable, parallel performance monitoring

Authors

  • Aroon Nataraj
  • Allen D. Malony
  • Alan Morris
  • Dorian C. Arnold
  • Barton P. Miller
Abstract

Performance monitoring of HPC applications offers opportunities for adaptive optimization based on dynamic performance behavior, unavailable in purely post-mortem performance views. However, a parallel performance monitoring system must have low overhead and high efficiency to make these opportunities tangible. We describe a scalable parallel performance monitor called TAUoverMRNet (ToM), created from the integration of the TAU performance system and the Multicast Reduction Network (MRNet). The integration is achieved through a plug-in architecture in TAU that allows selection of different transport substrates to offload online performance data. A method to establish the transport overlay structure of the monitor from within TAU, one that requires no added support from the job manager or application, is presented. We demonstrate the distribution of performance analysis from the sink to the overlay nodes and the reduction in large-scale profile data that could otherwise overwhelm any single sink. Results show low perturbation and significant savings accrued from reduction at large processor-counts.
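The abstract describes reducing large-scale profile data inside the overlay before it reaches the sink. The following is a minimal sketch of that general idea only. It is not the ToM, TAU, or MRNet implementation, and the abstract does not specify which reduction operations the paper uses. It assumes that each application rank contributes a flat profile mapping event names to inclusive times, and that the internal overlay nodes form a tree with a fixed fan-out. Each internal node merges its children's profiles into per-event count, sum, minimum, and maximum. Because these statistics are associative, they can be combined at any level of the tree, so the sink receives one summary per event rather than one record per rank. All names (`Summary`, `leaf_profile`, `merge_profiles`, `reduce_tree`, `fanout`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Summary:
    """Per-event statistics that can be merged in any order (associative)."""
    count: int
    total: float
    minimum: float
    maximum: float

    def merge(self, other: "Summary") -> "Summary":
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count


# A profile maps an event name (e.g. "MPI_Send") to its summary.
Profile = Dict[str, Summary]


def leaf_profile(event_times: Dict[str, float]) -> Profile:
    """Wrap one rank's raw per-event times as single-sample summaries."""
    return {event: Summary(1, t, t, t) for event, t in event_times.items()}


def merge_profiles(children: List[Profile]) -> Profile:
    """Hypothetical internal overlay node: merge child profiles event by event."""
    merged: Profile = {}
    for child in children:
        for event, summary in child.items():
            merged[event] = merged[event].merge(summary) if event in merged else summary
    return merged


def reduce_tree(leaves: List[Profile], fanout: int) -> Profile:
    """Merge leaf profiles level by level until a single profile reaches the sink."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not leaves:
        return {}
    level = leaves
    while len(level) > 1:
        # Each group of `fanout` siblings is merged by one parent node.
        level = [merge_profiles(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


# Usage: 64 ranks, each timing two events, reduced through a fan-out-4 tree.
ranks = [leaf_profile({"MPI_Send": float(r), "compute": 10.0}) for r in range(64)]
result = reduce_tree(ranks, fanout=4)
print(result["MPI_Send"].count, result["MPI_Send"].minimum, result["MPI_Send"].maximum)  # 64 0.0 63.0
print(result["compute"].mean)  # 10.0
```

With 64 ranks and a fan-out of 4, the tree has three merge levels (64 to 16 to 4 to 1). Each internal node forwards a profile whose size depends on the number of events, not on the number of ranks beneath it. This is the property that keeps a single sink from being overwhelmed as processor counts grow.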

Similar resources

A Framework for Scalable, Parallel Performance Monitoring using TAU and MRNet

Performance monitoring of HPC applications offers opportunities for adaptive optimization based on dynamic performance behavior, unavailable in purely post-mortem performance views. However, a parallel performance monitoring system must have low overhead and high efficiency to make these opportunities tangible. We describe a scalable parallel performance monitor called TAUoverMRNet (ToM), creat...

Parallel computation framework for optimizing trailer routes in bulk transportation

We consider a rich tanker trailer routing problem with stochastic transit times for chemicals and liquid bulk orders. A typical route of the tanker trailer comprises sourcing a cleaned and prepped trailer from a pre-wash location, pickup and delivery of chemical orders, cleaning the tanker trailer at a post-wash location after order delivery and prepping for the next order. Unlike traditiona...

Semi-on-line Monitoring of P-GRADE Applications

GRM and PROVE are the monitoring and performance visualisation tools of the P-GRADE graphical parallel program development and execution environment running on clusters. With semi-on-line monitoring, application behaviour data can be requested by the analysis tool without it being a real on-line tool. This paper discusses the semi-on-line monitoring method and its use in the performance analysis of ...

Scalable Process Monitoring through Rules and Neural Networks

In this paper we introduce RuleRunner, a Runtime Verification system for monitoring LTL properties over finite traces. By exploiting results from the Neural-Symbolic Integration area, a RuleRunner monitor can be encoded in a recurrent neural network. The results show that neural networks can perform real-time runtime verification and techniques of parallel computing can be applied to improve th...

A Scalable and Performant Grid Monitoring and Information Framework

Distributed resource property repositories and state monitoring systems are critical components of any Grid Management Architecture, providing Grid scheduler, job/execution manager and state estimation components with accurate information about network, computational and storage resource properties and status. Without an up-to-date information and monitoring service, intelligent scheduling decis...

Journal title:
  • Concurrency and Computation: Practice and Experience

Volume 22, Issue

Pages -

Publication date 2010